System and method for providing a searchable library of electronic documents to a user

ABSTRACT

A method and system for publishing a plurality of books for user access to information includes selecting a plurality of books, converting each book from a publisher's digital form, e.g., by training a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable library database arranged, for example, as an XML database indexed by book structure such that a user may remotely, over the Internet or other network, access the database, search desired content, and view an image of a portion of the book with the desired data. The system includes a user registration module to identify an authorized user, and may maintain a personal bookshelf for the user. A search engine may score search results based on their position in the hierarchy or other factors, determining degree of relevance of text or data information located by the search engine. The other factors may include position of located search data in the hierarchy, identification of search data in the user's personal library or in a prior search by the user, or degree of match of data identified in the search. An interface with a commercially available search engine may operate to adapt the search. When provided a search query by a user, it may search for an exact match and score hits for relevance, and in the event an exact match is not found, operate to expand the query and return hits in order of rank together with an indication of the expanded search. The user may thus ascertain a degree of likely relevance of returned text or data information. The relational database may include hyperlinks to section headings and related data passages, such that a user accessing a page of a book may immediately view related data and context of a page.
The relational database is indexed by logical subunits of the book such that expanded searches for Boolean combinations or proximity of elements span page breaks of book text to identify all instances of the desired search data. The search engine may expand a search if all hits have low ranking, and may suppress hits of low ranking when the search produces hits of high ranking. In further embodiments, the search engine may search tables, drawings and formulae of the converted book file.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to and claims the priority under 35 U.S.C. 119(e) of U.S. Provisional Patent Application Ser. No. 60/170,038 filed Dec. 10, 1999. That application is hereby incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

The invention relates generally to a system and method for providing a searchable library of electronic documents to a user and more specifically to a system and method for providing a searchable library of electronic books which can be accessed by a user over a global computer network such as the Internet.

Book publishers publish many books only in hard copy form. In many instances these hard copy books are not available to individuals until the books reach retail stores or are available for purchase through mail order or on-line services. In order for an individual to have access to the contents of the hard copy book, the individual must either go to a retail store and purchase the book or order the book and wait for the book to be delivered. In both instances the book is not immediately available to the individual upon the individual desiring the book. In certain instances this causes a problem for an individual who requires instant information at a time of day when retail bookstores are not open for business.

In addition, in order for an individual to determine if a book in hard copy form contains desired information, the individual must read the back cover or a summary of the book, or scan the table of contents. This is a time-consuming process and the information desired by the individual may still be located in the book, even if it is not identified in the summary or in the table of contents. There is no automatic mechanism for searching for key words in the text of a hard copy book. Further, many books published on the same topic may contain different information which is valuable to the individual. In many instances it is prohibitively expensive for an individual to purchase all the relevant books, and new books are constantly being published.

What is desired then is a system and method for providing a searchable library of electronic documents which is accessible to an individual at any time. What is also desired is a system and method for publishing a hard copy document in an electronic form which may be searched by an individual. The present invention permits such functionality.

SUMMARY OF THE INVENTION

The invention relates to a system and method for providing a searchable library of electronic documents to a user. The electronic library system includes an electronic document database, a search module, a user database, a user verification module, a web site server, a communication network and at least one user computer. The electronic library system is used for providing a searchable library of electronic documents to a user. The library of electronic documents is made accessible to a user upon the user obtaining a subscription to the electronic document library service. In one embodiment, the user pays a monthly fee in order to access documents stored in the electronic document library. The electronic documents may be books, magazines, or any other document which is in electronic form. In one embodiment, the electronic books are technical and computer books. The electronic documents may be accessed by a user computer by accessing the electronic document library's web site through a communication network. In one embodiment the communication network is a global computer network such as the Internet.

The invention also relates to a method for searching the electronic document database. In one embodiment, the user enters a key word query to search for documents in the electronic document database. The search module searches the document database in response to the query entered by the user. In one embodiment, the search module retrieves all the documents which satisfy the requirements of the key word query. In another embodiment, the search module retrieves the documents which are most relevant to the user's request. In yet another embodiment, the search module expands the search parameters in order to locate relevant documents.

The invention further relates to a system and method for publishing a hard copy document in an electronic form which may be searched by a user. In one embodiment, the electronic library system includes electronic document conversion computers which convert documents from hard copy form into electronic form. In one such embodiment, the electronic document conversion computers convert documents which are in an electronic form used by publishers to print hard copy documents into a form for electronic publication. The electronic form of documents used by publishers may be files from layout programs such as PageMaker, QuarkXPress, FrameMaker, or any other publishing program.

The present invention has the advantage of providing a searchable library of electronic documents which is accessible to a user at any time of the day. The invention has the further advantage of providing a library containing a quantity of documents which the user may not otherwise have access to and which is searchable by a user through key word searches. The invention therefore allows a user to have access to information at any time that information is required and provides the user with a convenient and time-efficient method for searching for the documents which are the most relevant to the user's needs.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is pointed out with particularity in the appended claims. The above and further advantages of this invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic block diagram of an embodiment of an electronic document library system according to the invention;

FIG. 2 is a flowchart illustrating the steps performed in an embodiment of the invention for a user to gain access to the documents stored in the electronic document library;

FIG. 3 is a diagram of the options available to a user through an embodiment of a main menu of the electronic document library system of the invention;

FIG. 4 is a pictorial view of an embodiment of a screen display which is presented to a user upon accessing the web site of the electronic document library;

FIG. 5 is a flowchart illustrating the steps performed by a user to gain access to the documents in the electronic document library;

FIG. 6 is a pictorial view of an embodiment of a login screen display;

FIG. 7 is a pictorial view of an embodiment of a registration screen display;

FIG. 8 is a pictorial view of an embodiment of a search screen display;

FIG. 9 is a pictorial view of an embodiment of a sub-topic list screen display;

FIG. 10 is a pictorial view of an embodiment of a search results screen display;

FIG. 11A is a pictorial view of another embodiment of a search results screen display;

FIGS. 11B and 11C are a pictorial view of an embodiment of a screen displaying the contents of an electronic document;

FIG. 12 is a pictorial view of another embodiment of a screen displaying the contents of an electronic document;

FIG. 13 is a pictorial view of an embodiment of a menu presented to a user to navigate the contents of an electronic document;

FIG. 14 is a flowchart illustrating the steps performed by the search module to return the most relevant documents in response to a user's query;

FIG. 15 is a block diagram illustrating the hierarchy of elements contained in typical hard copy books;

FIG. 16 is a flowchart illustrating the steps performed by the search module to improve the relevancy of the search results returned to a user;

FIG. 17 is a pictorial view of an embodiment of a virtual bookshelf;

FIG. 18 is a pictorial view of an embodiment of a screen display presented to a user to create a virtual bookmark;

FIG. 19 is a pictorial view of a screen display showing the manifest created by an embodiment of the tool for converting a publisher's file to an electronic document file; and

FIG. 20 is a pictorial view of a screen display showing the list of tables created by an embodiment of the tool for converting a publisher's file to an electronic document file.

Like reference characters in the respective drawn figures indicate corresponding parts.

DETAILED DESCRIPTION OF THE INVENTION

In broad overview, and referring to FIG. 1, an embodiment of an electronic document library system 10 according to the present invention includes an electronic document database 12, a search module 14, a user verification module 16, a user database 18, a web site server 20, a communication network 22 and at least one user computer 24. The electronic document library system 10 is used for providing a searchable library of electronic documents to a user. The library of electronic documents 26 is made accessible to a user upon the user obtaining a subscription to the electronic document library service. In one embodiment, the user pays a monthly fee in order to access documents stored in the electronic document library 26. In another embodiment, the user pays a yearly fee in order to gain access. In yet other embodiments, the user pays a fee based on another predetermined period of time.

In one embodiment, the documents available in the document database 12 are documents which are commercially available from publishers. In one such embodiment, the electronic document library provider obtains the right to electronically publish the documents from the publisher. The electronic document library provider provides the publisher with compensation in return for the right to electronically publish the document. In one embodiment the compensation is a flat fee. In another embodiment, the compensation is a royalty-based fee. For example, the electronic document library provider may pay each publisher a percentage of the subscription fees obtained from users. The fee may be based on the number of documents the publisher provides. In another embodiment the fee paid to the publisher is based on the number of users who access the publisher's documents. In yet another embodiment, the fee paid to the publisher is based on the number of times the publisher's documents are accessed.

In still another embodiment, the electronic document library provider negotiates with each publisher the percentage of the electronic document library provider's revenue reserved for royalties. The electronic document library provider's royalty pool size is then a blended average of the negotiated rates with each publisher. The publisher then competes based on content usage (i.e., retrievals) for a piece of their view of the royalty pool. The usage calculation is weighted: page retrievals, placing a book on a bookshelf, and the list price of the book all factor into usage points for a publisher. The electronic document library provider then divides up the royalty pool as seen by each publisher by getting a percentage of their usage points versus total usage points for all publishers. The royalty payments are made quarterly.
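By way of illustration only, the weighted usage calculation described above might be sketched in Python as follows. The weights, the reference price, the way list price scales the points, and all names are hypothetical; the specification states only that page retrievals, bookshelf placements and list price factor into a publisher's usage points, and that each publisher receives its share of usage points applied to its own view of the royalty pool.

```python
from dataclasses import dataclass

# Hypothetical weights; the specification does not state actual values.
PAGE_WEIGHT = 1.0
SHELF_WEIGHT = 5.0
REFERENCE_PRICE = 40.0  # assumed list price that yields a multiplier of 1.0


@dataclass
class BookUsage:
    page_retrievals: int
    bookshelf_placements: int
    list_price: float


def usage_points(book: BookUsage) -> float:
    """Weighted usage points for one book (assumed formula)."""
    activity = (PAGE_WEIGHT * book.page_retrievals
                + SHELF_WEIGHT * book.bookshelf_placements)
    return activity * (book.list_price / REFERENCE_PRICE)


def royalty_payments(revenue, rates, usage):
    """Pay each publisher its usage share of its own view of the pool.

    rates: publisher -> negotiated fraction of revenue reserved for royalties
    usage: publisher -> list of BookUsage records for the period
    """
    points = {p: sum(usage_points(b) for b in books)
              for p, books in usage.items()}
    total = sum(points.values())
    if total == 0:
        return {p: 0.0 for p in usage}
    # A publisher's view of the pool is revenue * its negotiated rate.
    return {p: revenue * rates[p] * points[p] / total for p in usage}


payments = royalty_payments(
    revenue=100_000.0,
    rates={"A": 0.20, "B": 0.30},
    usage={
        "A": [BookUsage(600, 0, 40.0)],
        "B": [BookUsage(300, 20, 40.0)],
    },
)
```

Under this reading, the total paid out equals the revenue multiplied by a usage-weighted blend of the negotiated rates, which is one way to interpret the "blended average" royalty pool.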

The document database 12 provides a central location for storing the electronic documents that may be accessed by a user. The electronic documents may be books, magazines, or any other document which is in electronic form. In one embodiment, the electronic books are technical and computer books. The electronic documents may be accessed by a user computer 24 as described in more detail below. In one embodiment, the document database 12 is maintained by a third party at the electronic document library 26. The document database 12 may be a single database or may be a plurality of databases which are in communication with each other.

The document database 12 is in communication with the search module 14. The search module 14 searches the document database 12 in response to queries entered by a user computer 24. In one embodiment the search module 14 retrieves the documents which are most relevant to the user's request. An embodiment of a process for retrieving the most relevant documents will be described in more detail below in the description of FIGS. 14-16. The search module 14 is in communication with the web site server 20. The web site server 20 generates an interactive screen display for interacting with a user computer 24. The web site server 20 is connected to the communication network 22. In one embodiment, the communication network 22 is a global computer network, such as the Internet. In another embodiment, the communication network 22 is a telephone network. In yet another embodiment, the communication network 22 is an electronic network dedicated to the communication of information between the web site server 20 and the user computer 24. The communication network 22 may be any system for communicating data between two computers.

The user computer 24 may be any computer which is capable of being connected to the communication network 22. The user computer 24 may be the user's own personal computer or any other computer or terminal which the user may access to connect to the communication network 22. For example, the user computer 24 may be a computer which the user has access to at the user's workplace. In one embodiment, the user computer 24 is not a single computer, but changes as the user accesses different computers.

The user database 18 provides a central location for storing user information. In one embodiment, the user database 18 stores a user name and a password for each user authorized to access documents stored in the document database 12. In another embodiment, the user database 18 stores personal information regarding authorized users, such as name, address, telephone number, e-mail address, occupation, etc. In another embodiment, the user database 18 stores billing information for each user. For example, the user database 18 may store information indicating that the user's personal credit card account should be charged to keep the user's subscription current. Alternatively, the user database 18 may store information indicating that the user's employer should be charged to keep the user's subscription current. The user database 18 is in communication with a user verification module 16 which receives user information from the user computer 24 through the communication network 22 and the web site server 20. The user verification module 16 compares the user identification information received from the user computer 24 to the entries in the user database 18 and determines whether a user should be granted access to the document database 12.

In one embodiment, a user is allowed to search the document database 12 without being authorized by the user verification module 16. In this embodiment, the user is verified by the user verification module 16 upon requesting to view the contents of an electronic document. In another embodiment, the user is verified by the user verification module 16 before being able to search the document database 12. In yet another embodiment, the user is able to view only certain documents in the document database 12 without being verified by the user verification module 16.

In one embodiment, the electronic library system 10 includes a registration module 28. The registration module 28 enables users who are not currently included in the user database 18 to register for the service and to be added to the user database 18. In another embodiment, the user contacts the third party maintaining the electronic library system 10 in order to register for the service and be added to the user database 18. The registration process will be described in more detail below in the discussion of FIG. 7.

In another embodiment, the electronic library system 10 includes electronic document conversion computers 30 which convert documents from hard copy form into electronic form. In one such embodiment, the electronic document conversion computers 30 convert documents which are in an electronic form used by publishers to print hard copy documents into a form for electronic publication. The electronic form of documents used by publishers may be files from layout programs such as PageMaker, QuarkXPress, FrameMaker, or any other publishing program. The process for converting books into a standard electronic form will be described in more detail below.

Referring to FIG. 2, a flowchart illustrates the series of steps performed for a user to gain access to the document database 12. At step 32, a user has access to a computer 24, such as a personal computer. The user connects the computer 24 to the communication network 22. In one embodiment, the user connects the computer 24 to a global computer network, such as the Internet, through a service provider. Next, in step 34 the user selects the web site address of the electronic document library 26. In step 36, in response to the user's selection, the web site server 20 transmits the screen display for the home page to the user computer 24 for display on the monitor of the user computer 24. The home page displays the main menu 37 from which the user may select.

The main menu 37 provides the user with a central page from which all the major functions of the system 10 can be reached. The main menu screen presents the user with the different functions which may be selected by pointing and clicking on an appropriate button or icon. FIG. 3 illustrates the options from which a user may select upon entering the main menu page. The options include: register 38, new documents 39, most popular documents 40, login 41, search documents 42, personal bookshelf 43, help 44 and feedback 45. Each of these options will be described in detail below. FIG. 4 presents one embodiment of a screen display 50 which may be presented to a user as a home page or main menu page. Referring back to FIG. 2, in step 46, the user reviews the options and selects the option of interest. In response to the user's selection, the web site server 20 determines which option the user selected in step 48 and selects the appropriate page to present to the user.

If the user selects the login option 41, the web site server 20 proceeds to step 52 and displays a user login page. The flowchart of FIG. 5 illustrates the steps performed by one embodiment of the present invention to log in a user. As described above, in step 54 the user selects the login option 41. Next, in step 56, the web site server 20 transmits the screen display for the login page to the user computer 24. FIG. 6 is one embodiment of a login page screen display 58. The login page screen display 58 queries the user for a user name and a password. The login page screen display 58 includes a user name field 60 and a password field 62. Once the user has entered the user name and password into the login page screen display 58, the user selects the “Login” button 64 or hits a “Return/Enter” key. Upon selecting the “Login” button 64, the user computer 24 transmits the user identification information to the web site server 20 in step 66. Once the web site server 20 receives the user identification information entered into the login screen 58, in step 68, the web site server 20 transfers the user identification information to the user verification module 16. The user verification module 16 searches the user database 18 to determine if the user name is contained in the user database 18 and to determine if the entered password is valid.

If the user name and associated password are valid, the user verification module 16 grants the user access to the full contents of the documents in the electronic document database 12 in step 70. In one embodiment, if the user name is valid, but the associated password is invalid, the user verification module 16 sends an error message to the user computer 24 through the web site server 20 and requests that the user log in again. If the user name is not valid, the user verification module 16 denies the user access to the electronic document database 12 in step 72. In another embodiment, the user is granted access to search the document database 12, but may not view the full contents of the electronic documents. In another embodiment, if the user is denied access in step 72, the user is given an opportunity to obtain a subscription to the electronic document library service by registering with the electronic document library 26. In one embodiment, the user may set the user's preferences to automatic login in order to avoid entering a user name and password each time the user accesses the web site server 20. In this embodiment, the user verification module 16 still verifies the user name and password each time the user logs into the web site server 20 in order to verify that the user's subscription is still valid.
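The three login outcomes described above (access granted, error with a request to log in again, and access denied) can be illustrated with a minimal Python sketch. The class and function names are hypothetical. Storing a salted PBKDF2 hash rather than the password itself is an assumption of this sketch, not something the specification requires.

```python
import hashlib
import hmac
import os
from enum import Enum


class LoginResult(Enum):
    GRANTED = "granted"  # valid user name and password (step 70)
    RETRY = "retry"      # valid user name, invalid password: log in again
    DENIED = "denied"    # unknown user name (step 72)


def _hash(password: str, salt: bytes) -> bytes:
    # Assumption: passwords are stored as salted PBKDF2 hashes.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, 100_000)


class UserDatabase:
    """In-memory stand-in for the user database."""

    def __init__(self):
        self._users = {}

    def add_user(self, name: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[name] = (salt, _hash(password, salt))

    def lookup(self, name: str):
        return self._users.get(name)


def verify(db: UserDatabase, name: str, password: str) -> LoginResult:
    """Compare submitted credentials with the entries in the database."""
    record = db.lookup(name)
    if record is None:
        return LoginResult.DENIED
    salt, stored = record
    if hmac.compare_digest(stored, _hash(password, salt)):
        return LoginResult.GRANTED
    return LoginResult.RETRY
```

A web site server built along these lines would call `verify` on each login, including automatic logins, so that a lapsed subscription could be detected by removing or flagging the user's record.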

Referring again to FIG. 2, if the user selects the register option 38, the web site server 20 proceeds to step 72 and displays the registration page. FIG. 7 shows one embodiment of a registration screen 74. The registration screen 74 allows a user to request a subscription to the electronic document library 26. The registration screen 74 queries the user for user identification information, company information, telephone number information, e-mail address information and number of subscriptions requested. In another embodiment, the registration screen 74 also queries the user for billing information, such as a credit card number. The registration screen 74 includes user name fields 76, 78, a job title field 80, an e-mail address field 82, a telephone number field 84, a country field 86, a company name field 88, a department name field 90, a department size field 92 and a number of seats requested field 94. The number of seats requested field 94 enables a user to purchase multiple subscriptions at the same time. For example, a company may wish to purchase subscriptions for several of its employees. In one embodiment, the country field 86 is a drop down menu from which a user may select a country. In another embodiment, the registration screen 74 also includes fields for the user's address and billing information, such as a credit card number and expiration date. In another embodiment, the registration screen 74 includes a field for the user to enter a preferred password. Once the user has entered all of the required information into the registration screen 74, the user selects the OK button 96 or hits the “Return/Enter” key.

After the user has entered the necessary information into the registration screen 74 and selected the OK button 96, the user computer 24 transmits the information to the web site server 20. The web site server 20 transfers the information to the registration module 28 which determines if the user should be granted a subscription. If the user is granted a subscription, the registration module 28 adds a user name and a password to the user database 18. In one embodiment, the registration module 28 uses the user name entered by the user in the user name fields 76, 78. In another embodiment the registration module 28 selects a user name for the user. In yet another embodiment, the registration module 28 uses the password entered by the user as a preferred password. In another embodiment, the registration module 28 selects a password to be associated with the user name. If the user is granted a subscription, the registration module 28 contacts the user and informs the user of the user name and password to use when logging into the electronic document library 26. In one such embodiment, the registration module 28 sends an e-mail message to the user. The e-mail message contains the user name and password to be entered by the user when logging into the electronic document library 26. In another embodiment, the user is given the opportunity to change the user name and password.
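The embodiments in which the registration module selects a user name and a password might look like the following Python sketch. The naming scheme (first initial plus last name, with a numeric suffix on collision), the password alphabet and length, and the dictionary standing in for the user database are all assumptions made for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 12) -> str:
    """Select a random password (assumed policy: 12 letters and digits)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def register(user_db: dict, first: str, last: str, preferred_password=None):
    """Add a new user and return the (user name, password) pair to send.

    user_db maps user name -> password; a production system would store
    a salted hash instead of the password itself.
    """
    base = (first[0] + last).lower()  # assumes non-empty name fields
    name, suffix = base, 1
    while name in user_db:            # avoid clashing with existing users
        suffix += 1
        name = f"{base}{suffix}"
    password = preferred_password or generate_password()
    user_db[name] = password
    return name, password
```

The returned pair corresponds to the information the registration module would place in the e-mail message sent to the new subscriber.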

Referring again to FIG. 2, if the user selects the new documents option 39, the web site server 20 proceeds to step 98 and displays information regarding documents recently added to the document database 12. If the user selects the most popular documents option 40, the web site server 20 displays information regarding documents which have been accessed by the greatest number of users or which have been accessed the greatest number of times during a predetermined period. In one embodiment, the predetermined period is the previous week.
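Both readings of "most popular" (greatest number of users, or greatest number of accesses, within a predetermined period) can be sketched in Python as below. The access-log format, the function name and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_popular(access_log, now, period=timedelta(days=7),
                 by_users=True, limit=10):
    """Rank documents accessed within the period ending at `now`.

    access_log: iterable of (timestamp, user_id, document_id) tuples.
    by_users=True counts distinct users per document; False counts accesses.
    """
    cutoff = now - period
    recent = [(user, doc) for ts, user, doc in access_log
              if cutoff <= ts <= now]
    if by_users:
        recent = set(recent)  # count each user at most once per document
    counts = Counter(doc for _, doc in recent)
    # Sort by descending count, then by document id for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [doc for doc, _ in ranked[:limit]]
```

With the default period of seven days this corresponds to the embodiment in which the predetermined period is the previous week.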

If the user selects the search documents option 42, the web site server 20 proceeds to step 100 and displays a find books page screen. FIG. 8 shows one embodiment of a find books page screen 102. In the embodiment shown in FIG. 8, the find books screen 102 presents the user with three options for searching the document database 12 for electronic documents. In one option, the user is able to browse a topic list 104. The topic list 104 of FIG. 8 identifies topics which may be of interest to information technology and computer specialists. Other embodiments of the topic list 104 may contain topics of interest to individuals in other fields. In one embodiment, once a user selects a topic from the topic list 104, the user is presented with a list of sub-topics. For example, if a user selects the “Databases” topic 106, the user is presented with the sub-topic list 108 shown in FIG. 9. The user then has the option to return to the main topic list 110, select one of the sub-topics 108 or go to related topics 112.

Once the user selects one of the sub-topics 108, the search module 14 identifies the documents categorized under the selected sub-topic and the user is presented with a list of documents which are categorized under the selected sub-topic. For example, if the user selects the “Access” sub-topic 114, the user is presented with a list of documents in the form of the screen display 116 shown in FIG. 10. The user may then select one of the documents or return to the sub-topic list 108 or main topic list 104.

Referring again to FIG. 8, in another embodiment, the user is able to search for a particular document by supplying information regarding the particular document. For example, the user may search for a document by entering words in the document's title, the author's name, the publisher or the ISBN. The user enters the document information into the lookup field 118. Once the user enters the document information, the user selects the “Lookup” button 120 or hits the “Return/Enter” key. The search module 14 then searches the document database 12 for documents which correspond to the user's query and returns a list of the relevant documents.

In another embodiment, the user is able to search for an electronic document in the document database 12 by entering key words which might occur in the text of the document. The user enters the key words into the key word field 122. Once the user has entered the key words, the user selects the “Search” button 124 or hits the “Return/Enter” key. The search module 14 then searches the document database 12 for documents which contain the key words. In one embodiment, the user is able to instruct the search module 14 to search for documents in the entire document database 12, in a particular category from the main topic list 104 or sub-topic list 108, in a results list from a previous search or in books present in the user's personal bookshelf.
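A key word search restricted to a chosen scope might be sketched in Python as follows. The `Document` record, the whole-word matching, and the rule that a document must contain every key word are assumptions; the specification says only that documents containing the key words are located.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    title: str
    text: str
    topics: set = field(default_factory=set)


def keyword_search(documents, keywords, scope=None):
    """Return ids of in-scope documents that contain every key word.

    scope: optional set of document ids (a topic category, a previous
    results list, or a personal bookshelf); None searches everything.
    """
    terms = [k.lower() for k in keywords.split()]
    hits = []
    for doc in documents:
        if scope is not None and doc.doc_id not in scope:
            continue
        words = set(re.findall(r"\w+", doc.text.lower()))
        if all(term in words for term in terms):
            hits.append(doc.doc_id)
    return hits
```

Each of the scopes named above reduces to a set of document ids: for example, a category scope could be built as `{d.doc_id for d in documents if "Databases" in d.topics}`, and a search within previous results simply passes the earlier hit list as a set.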

In the example shown in FIG. 11A, the user has entered the key words “java jar files” in the key word field 122. The user has also selected to search all documents. The search module 14 searched the document database 12 and found 21 documents. The search module 14 presents a list of the found documents to the user. In one embodiment, the search module 14 searches the document database 12 for the documents which contain information most relevant to the user's query. An embodiment of a method for determining which documents are most relevant will be described in detail below in the discussion of FIGS. 14-16. In one such embodiment, the search module presents the results of the search in order of relevance. In one embodiment, the search module 14 indicates the relevance of documents as “High”, “Medium”, or “Low”. In still another embodiment, the search module 14 indicates the sections of the document which are most relevant. In yet another embodiment, the search module 14 presents the most relevant documents and links to the most relevant sections within these documents. In the embodiment shown in FIG. 11A, the search module 14 indicates the relevancy of different sections by using a thermometer type gauge 126. The thermometer gauge which is the most filled indicates the highest relevancy. For example, in FIG. 11A, the section “Minimizing Applet Loading Times” 128 has a higher relevancy than the section “Using the Java Development Kit (JDK)” 130. In other embodiments, other types of level indicators are used to indicate relevancy.
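The two relevancy indicators described above can be illustrated with a short Python sketch that maps a numeric relevancy score to a “High”, “Medium” or “Low” label and to a text rendering of a thermometer type gauge. The thresholds at one third and two thirds of the maximum score, the gauge width, and the function names are assumptions.

```python
def relevance_label(score: float, max_score: float) -> str:
    """Label a score relative to the best score in the result list."""
    ratio = score / max_score if max_score > 0 else 0.0
    if ratio >= 2 / 3:   # assumed threshold
        return "High"
    if ratio >= 1 / 3:   # assumed threshold
        return "Medium"
    return "Low"


def gauge(score: float, max_score: float, width: int = 10) -> str:
    """Render a text gauge; a fuller gauge means higher relevancy."""
    filled = round(width * score / max_score) if max_score > 0 else 0
    filled = max(0, min(width, filled))
    return "[" + "#" * filled + "-" * (width - filled) + "]"
```

A result list would typically pass the highest score among the returned sections as `max_score`, so that the most relevant section shows a completely filled gauge.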

From the search results list, the user may select to view an electronic document. For example, the user may select to view the book “Mastering Java 2” by clicking on the underlined title. The web site server 20 then displays the book to the user. In one embodiment, the web site server 20 displays the entire content of the book to the user, including text, graphics, tables, code listings, etc. In another embodiment, the web site server 20 displays the contents of the book just as the user would see the contents if viewing the hard copy version. Continuing the example from above and referring to FIGS. 11B and 11C, in one embodiment the web site server 20 displays a synopsis of the book 132, the table of contents of the book 134 and the information on the back cover of the book 136. In one embodiment, the web site server 20 also provides links to colleague comments about the book and public comments about the book. The web site server 20 also displays the thermometer gauges next to the most relevant sections of the book.

Referring back to FIG. 11A, the user may also select a link that leads directly to the relevant section of a document. For example, the user may select the link “Minimizing Applet Loading Times” 128 to go directly to that section of the document. FIG. 12 shows an example of a screen 137 which displays the content of the document section selected by the user. In one embodiment, the web site server highlights the key words 138 entered by the user.

FIG. 13 shows an embodiment of a menu 140 presented to the user along with the contents of the selected document. The menu 140 identifies the portion 142 of the document the user is viewing and presents the user with links to other portions of the document.

In order to understand how this embodiment of a search is performed, it is important to understand how documents are stored in the document database 12. Knowledge of the typical structure of a type of document is used when documents are added to the document database 12. When the electronic documents are stored in the document database 12, the documents are broken down into a hierarchy of constituent elements. The constituent elements are linked together in a relational scheme so that the document database 12 can identify which documents a particular element is part of. An index is created for each document which identifies the constituent elements. FIG. 15 illustrates a possible hierarchy for breaking down a book 146 into its constituent parts. Many books are composed of parts 147, each part 147 being composed of chapters 148, each chapter 148 being composed of sections 150 and each section 150 being composed of sub-sections 152. The sub-sections may contain text, tables, graphics, code listings, etc. Other documents may be composed of other elements.
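As an illustration only, the hierarchy of FIG. 15 and the per-document index might be modeled as follows. This is a minimal sketch, not the storage scheme of the document database 12: the `Element` class, the `build_index` helper and the sample titles and text are hypothetical names introduced here, and element titles are assumed to be unique within a document.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Element:
    """One node of a document hierarchy (book, part, chapter, section, sub-section)."""
    kind: str
    title: str
    content: str = ""  # text held directly by a lowest level element
    children: List["Element"] = field(default_factory=list)

    def leaves(self) -> Iterator["Element"]:
        """Yield the lowest level elements at or beneath this element."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def build_index(root: Element) -> Dict[str, List[str]]:
    """Map each element title to the chain of titles leading to it from the root."""
    index: Dict[str, List[str]] = {}

    def walk(node: Element, path: List[str]) -> None:
        here = path + [node.title]
        index[node.title] = here  # assumes titles are unique within the document
        for child in node.children:
            walk(child, here)

    walk(root, [])
    return index


# Hypothetical sample data following the book/part/chapter/section/sub-section layout.
book = Element("book", "Mastering Java 2", children=[
    Element("part", "Part 1", children=[
        Element("chapter", "Chapter 1", children=[
            Element("section", "Section 1", children=[
                Element("sub-section", "Sub-section 1", "jar files bundle classes"),
                Element("sub-section", "Sub-section 2", "applets load faster from a jar"),
            ]),
        ]),
    ]),
])
index = build_index(book)
```

Here `index["Sub-section 2"]` gives the chain of enclosing elements, which is one way a store could identify which document a particular element is part of.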

FIG. 14 shows a flowchart illustrating the steps performed by the search module 14 to locate the most relevant documents in the document database 12 in response to the key words entered by the user in the key word field 122. In step 144, the search module 14 receives the search terms from the user. In step 154 the search module 14 searches the lowest level elements of the documents. For example, if the document is a book 146, the search module 14 searches the subsections 152 for the key words entered by the user. Next, in step 156, the search module ranks the relevancy of the lowest level elements located. The search module may use any of the relevancy ranking algorithms known in the art to rank the relevancy of the lowest level elements located. All of the objects within the element are included in the calculation. For example, text, graphics, tables, code listings, etc. are all searched and used in the calculation. In one embodiment, the search module 14 calculates the number of times a search term appears in the element. After the relevancy of the lowest level elements has been ranked, in step 158 the search module 14 sums the relevancy rankings of the lowest level elements to determine the ranking for the next higher element. For the book example shown in FIG. 15, the search module would sum the relevancy rankings for subsections 152 to determine a relevancy ranking for Section₁. Next, in step 160, the search module 14 determines if the relevancy ranking for the highest level element has been calculated. For the book example of FIG. 15, the search module 14 determines if the relevancy ranking has been calculated for the book 146. If the relevancy ranking for the highest level element has not been calculated, the search module 14 returns to step 158 and repeats steps 158 and 160 until the relevancy ranking for the highest level element is calculated. Referring again to the book example of FIG. 15, the search module 14 would sum the relevancy rankings for sections 150 to determine a relevancy ranking for Chapter₁, sum the relevancy rankings for the chapters 148 to determine a relevancy ranking for Part₁, and then sum the relevancy rankings for the parts 147 to determine a relevancy ranking for the book 146. Once the relevancy ranking has been determined for the highest level elements, the search module 14 returns a list of the documents with the highest rankings to the user in step 162.
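The bottom-up summation of steps 154 through 160 can be sketched as follows. This is a minimal illustration under stated assumptions: each element is represented as a plain dictionary, the lowest level ranking is the simple term count mentioned above (any other ranking algorithm could be substituted), and the names `count_hits` and `rank` and the sample text are hypothetical.

```python
def count_hits(text, terms):
    """Relevancy of a lowest level element: total occurrences of the search terms."""
    words = text.lower().split()
    return sum(words.count(term.lower()) for term in terms)


def rank(element, terms):
    """Return {title: ranking} for an element and everything beneath it.

    An element is a dict with a "title" and either "text" (lowest level)
    or "children" (higher level). The ranking of a higher level element
    is the sum of the rankings of its children.
    """
    rankings = {}

    def visit(node):
        if "children" in node:
            total = sum(visit(child) for child in node["children"])
        else:
            total = count_hits(node["text"], terms)
        rankings[node["title"]] = total
        return total

    visit(element)
    return rankings


# Hypothetical book following the hierarchy of FIG. 15.
book = {"title": "Book", "children": [
    {"title": "Part 1", "children": [
        {"title": "Chapter 1", "children": [
            {"title": "Section 1", "children": [
                {"title": "Sub-section 1", "text": "jar files hold java classes"},
                {"title": "Sub-section 2", "text": "sign the jar and ship the jar"},
            ]},
            {"title": "Section 2", "children": [
                {"title": "Sub-section 3", "text": "threads and locks"},
            ]},
        ]},
    ]},
]}

rankings = rank(book, ["java", "jar"])
```

In this example each of the first two sub-sections scores 2, so Section 1 scores 4, Section 2 scores 0, and the total of 4 carries up through the chapter and part to the book.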

In one embodiment, the search module 14 assigns a relevancy ranking based on the number of hits and the quality of the hits (i.e. exact words in key word search). In another embodiment, the search module 14 assigns a relevancy ranking to a document according to how many times the document has been accessed. In another embodiment, the search module 14 assigns a relevancy ranking according to whether the user performing the search has accessed the document previously or has stored the document on the user's personal bookshelf.

In one embodiment, the search module 14 evaluates the search results before returning the results to the user. In one such embodiment, before returning the results to the user in step 162, the search module 14 performs the steps illustrated in the flowchart of FIG. 16. In step 164, the search module 14 evaluates the search results. In certain embodiments, to evaluate the search results, the search module 14 determines whether the exact key words entered by the user appeared in the located documents, the number of times the key words appeared in the documents, whether the key words appeared in the titles of the different elements of the document or simply within the text, and the number of elements within a document in which the key words appeared. Based on this evaluation, in step 166 the search module 14 suppresses low ranking results. In one embodiment, the search module 14 suppresses low ranking documents. In another embodiment, the search module 14 suppresses low ranking elements within a document. Next, in step 168, the search module 14 determines whether to expand the search parameters. The search module 14 may determine to expand the search parameters if none of the documents located had a high relevancy ranking or if the number of documents located was low.
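Steps 166 and 168 might be sketched as below. The specification does not give thresholds, so `keep_ratio`, `high_ranking` and `min_results` are hypothetical parameters chosen only for illustration, as is the function name `evaluate`.

```python
def evaluate(results, keep_ratio=0.25, high_ranking=5, min_results=3):
    """Suppress low ranking results and decide whether to expand the search.

    results: list of (title, ranking) pairs.
    Returns (kept, expand), where kept is sorted from highest to lowest ranking.
    All three thresholds are illustrative assumptions.
    """
    if not results:
        return [], True  # nothing located: expand the search

    top = max(ranking for _, ranking in results)

    # Step 166: drop results that rank far below the best result.
    kept = [(title, ranking) for title, ranking in results
            if ranking > 0 and ranking >= keep_ratio * top]
    kept.sort(key=lambda pair: pair[1], reverse=True)

    # Step 168: expand if nothing ranked highly or too few results remain.
    expand = top < high_ranking or len(kept) < min_results
    return kept, expand


kept, expand = evaluate([("A", 10), ("B", 1), ("C", 6), ("D", 4)])
```

With these sample rankings, result B falls below a quarter of the top ranking and is suppressed, and the search is not expanded because a high ranking result exists and three results remain.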

If the search module 14 determines the search results are sufficient and the search parameters do not need to be expanded, the search module 14 proceeds to step 162 of FIG. 14 and returns the results to the user. If the search module 14 determines that the search results are not sufficient and the search parameters should be expanded, the search module 14 proceeds to step 170 and employs fallback strategies to expand the search results.

In determining whether to implement fallback strategies, the search module 14 must balance precision and recall. Precision refers to the accuracy of the documents located. If specific key words are important, to be precise the search results must contain those key words. Recall refers to the scope of the documents located. For example, by searching for only specific key words, a user may inadvertently omit documents which are relevant. Often users are not able to formulate a search which will return the most relevant documents. A user may not think to enter certain key words. The search module 14 balances precision and recall.
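The specification uses precision and recall informally. For readers who want the conventional information retrieval definitions, the following sketch computes both for a set of retrieved documents against a set of documents known to be relevant. The function name and the sample document labels are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Conventional definitions:

    precision = (relevant documents retrieved) / (all documents retrieved)
    recall    = (relevant documents retrieved) / (all relevant documents)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = ["A", "B", "C", "D"]
narrow = precision_recall(["A"], relevant)                           # exact key words only
broad = precision_recall(["A", "B", "C", "X", "Y", "Z"], relevant)   # expanded search
```

The narrow search is fully precise but finds only a quarter of the relevant documents, while the broad search finds three quarters of them at the cost of returning irrelevant ones. This is the trade-off the search module 14 balances.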

In one embodiment, the search module 14 begins by searching for highly precise results, e.g., results containing the exact key words in the exact manner entered by the user. If adequate results are not returned, the search module 14 works toward high recall. The search module 14 simulates the type of searching performed by information librarians. The strategies for expanding the search results may include stemming the key words entered by the user, implementing “near” operators, and implementing “and” and “or” operators. If the user has entered more than one key word, the search module 14 implements “near” operators by searching for the key words within a certain distance of each other rather than searching for a phrase containing the key words directly next to each other. To implement the “and” operator, the search module 14 looks for both key words being in the document rather than being directly next to each other. To implement the “or” operator, the search module 14 searches for documents containing at least one of the key words entered by the user. In different embodiments the search module 14 employs different strategies for expanding the search and employs the strategies in different orders. The purpose of expanding the search is to provide the user with the optimum results.
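The operators named above can be illustrated with simple predicates over a list of words. This is a sketch only: the function names are hypothetical, the `stem` function is a deliberately naive stand-in for a real stemming algorithm, and the default window of five words for the “near” operator is an assumption, since the specification says only “a certain distance”.

```python
def tokens(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [word.strip(".,;:!?()\"'").lower() for word in text.split()]


def stem(word):
    """Naive stemmer (an assumption): drop a trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def match_phrase(doc, terms):
    """True if the terms appear directly next to each other, in order."""
    n = len(terms)
    return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))


def match_near(doc, terms, distance=5):
    """True if some window of `distance` consecutive words contains every term."""
    for start in range(len(doc)):
        window = set(doc[start:start + distance])
        if all(term in window for term in terms):
            return True
    return False


def match_and(doc, terms):
    """True if every term appears somewhere in the document."""
    return all(term in doc for term in terms)


def match_or(doc, terms):
    """True if at least one term appears in the document."""
    return any(term in doc for term in terms)


doc = tokens("Package the Java classes into a single jar, then sign the files.")
terms = ["java", "jar", "files"]
```

For this sample sentence the phrase match fails, the “near” match fails with a window of five words but succeeds with a window of ten, and both the “and” and “or” matches succeed, which shows how each step widens the set of matching documents.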

Once the search module 14 determines a strategy for expanding the search parameters, the search module 14 returns to step 154 of FIG. 14 and repeats steps 154-168 until the search module 14 determines not to expand the search parameters. In one embodiment, when repeating step 156, the ranking assigned to a document element is dependent upon the search performed. For example, a document containing an exact phrase entered by a user receives a higher relevancy ranking than a document containing each word of the phrase in separate parts of the document.

In one embodiment, along with returning the search results, the search module 14 displays the actual search conducted to locate the search results. For example, if the user entered the key words “java jar files” and the search module 14 returned documents containing the words “java” or “jar” or “file”, the search module would inform the user that the search conducted was: java or jar or file.
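The expansion loop, the strategy-dependent ranking, and the report of the search actually conducted might be combined as in the following sketch. It is one possible arrangement, not the method of the search module 14: the order of strategies (phrase, then “and”, then “or”), the weights 3, 2 and 1, the `min_hits` threshold and the function name are all assumptions, and stemming and the “near” operator are omitted for brevity.

```python
def search_with_fallback(documents, query, min_hits=1):
    """Try progressively broader strategies until enough documents match.

    documents: {title: text}.
    Returns (search_conducted, {title: ranking}).
    """
    terms = query.lower().split()
    phrase = " " + " ".join(terms) + " "

    def occurrences(words):
        return sum(words.count(term) for term in terms)

    strategies = [
        # (description shown to the user, weight, predicate over a word list)
        ('"' + " ".join(terms) + '"', 3,
         lambda words: phrase in " " + " ".join(words) + " "),
        (" and ".join(terms), 2,
         lambda words: all(term in words for term in terms)),
        (" or ".join(terms), 1,
         lambda words: any(term in words for term in terms)),
    ]

    for description, weight, matches in strategies:
        hits = {}
        for title, text in documents.items():
            words = text.lower().split()
            if matches(words):
                # A more precise strategy gives each occurrence a higher weight.
                hits[title] = weight * occurrences(words)
        if len(hits) >= min_hits:
            return description, hits

    return strategies[-1][0], {}  # broadest search found nothing


library = {
    "Book A": "use jar files to package java classes",
    "Book B": "a jar can be signed",
}
conducted, hits = search_with_fallback(library, "java jar files")
broad_conducted, broad_hits = search_with_fallback(library, "java jar files", min_hits=2)
```

In the first call the exact phrase is not found, so the search falls back to the “and” strategy and reports `java and jar and files`. In the second call one matching document is not enough, so the search falls back again and reports `java or jar or files`.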

In addition to searching for documents, a user may create a personal virtual “bookshelf” of documents. The bookshelf contains documents that the user accesses frequently or desires to be able to access without searching the document database 12. Referring again to FIG. 11B, upon accessing an electronic document, a user is presented with the option of adding the document to the user's personal bookshelf through an “Add to Bookshelf” option 172. Upon selecting the “Add to Bookshelf” option 172, the document is added to the user's personal bookshelf. Referring to FIG. 12, when viewing a portion of a document, the user is presented with an option of creating a virtual bookmark to that portion.

When the user selects the personal bookshelf option 43 from the main menu, the web site server 20 displays a list of the documents in the user's personal bookshelf and a list of the bookmarks created by the user (step 176 in FIG. 2). FIG. 17 shows an example of a screen displaying a document 180 and a bookmark 182 stored on the user's bookshelf. In one embodiment, when the create bookmark option 174 is selected by the user, in addition to simply adding a bookmark to the user's bookshelf, the web site server 20 presents the user with several options for personalizing the bookmark. FIG. 18 shows one embodiment of a screen display 184 for allowing a user to personalize a bookmark. The screen 184 includes a bookmark note field 186 into which the user may enter a comment about the document. In one embodiment, the screen 184 also allows the user to keep the bookmark private or share the bookmark with others. In another embodiment, the screen 184 allows a user to select an e-mail option 188. If the user selects the e-mail option 188, an e-mail notification is sent to the individuals identified by the user informing the individuals of the bookmark.

As mentioned above, in one embodiment, the electronic document library system 10 includes electronic document conversion computers 30 for creating digital versions of existing paper-based books which are aggregated, categorized and added to the document database 12. Hard copy book publishers use many different desktop publishing software programs to produce books. Examples of different programs include PageMaker, QuarkXPress, FrameMaker and others. These publishing programs produce files from which books are printed as hard copies by printing presses. The files are not in a form suitable for electronic publication. The electronic document conversion computers 30 execute an algorithm which converts files from different desktop publishing programs into a standard electronic format.

Attachment 1 is a user guide which describes the process for creating an electronic book from a file obtained from a publisher. The user guide describes the steps performed by a user, such as a person converting an electronic book file from a publisher's press-ready or pre-publication digital data format to a searchable electronic library format of the present invention, when using an embodiment of a document conversion tool to create an electronic book.

FIG. 19 shows a screen display illustrating the manifest created by an embodiment of the tool for converting a publisher's file to an electronic document file. The manifest lists all of the files composing the document.

FIG. 20 shows a screen display illustrating the list of tables included in a book created by an embodiment of the tool for converting a publisher's file to an electronic document file.

Thus it will be seen that the present invention provides a method and system for publishing a plurality of books for user access to information. The system includes selecting a plurality of books, converting each book from a layout or publication digital data form, e.g., by training a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable publishing database, e.g., arranged as an xml database indexed by book structure such that a user may remotely, e.g., over the internet or other network, access the database, search desired text data, and view an image of a portion of the book with the desired data. The system includes a user registration module to identify an authorized user, and may maintain a personal bookshelf for the user. Advantageously, a search engine may rank search results based on their position in the hierarchy or other factors, determining degree of relevance of text or data information located by the search engine. The other factors may include position of located search data in the hierarchy, identification of search data in the user's personal library, identification of search data in a prior search by the user or degree of match of data identified in the search.

The search engine may be a commercially available search engine together with an interface that operates, when provided a search query by a user, to search for an exact match and to score hits for relevance, and in the event an exact match is not found, operate to expand the query and return scored search results located by the search engine together with an indication of the expanded search such that the user may ascertain a degree of likely relevance of returned text or data information. The relational database may include hyperlinks to related section heading data, such that a user accessing a page of a book may immediately view related data and context of a page. The relational database is indexed by logical subunits of the book, so searches for boolean combinations or proximity of elements span page breaks of book text to identify all instances of the desired search data. The search engine may employ a graded or adaptive search strategy, and may expand a search if all hits have low ranking, and suppress hits of low ranking when the search produces hits of high ranking. In further embodiments, the search engine may be adapted to inspect the content of tables, formulae and drawings of the converted book files to further enhance utility of the system as a technical book library. This may be done, for example, by parsing data structures, appearing in a programming language such as C, that define the tables, formulae or drawings.

Attachment 2 is an example of a chapter of a book that has been converted to electronic form using the program described in the user guide of Attachment 1. Attachment 2 shows examples of the different tags applied by the program to identify different elements of the book.

One skilled in the art will appreciate further features and advantages of the invention based on the above-described embodiments. Accordingly, the invention is not to be limited by what has been particularly shown and described. All publications and references cited herein are expressly incorporated herein by reference in their entirety. 

1-25. (canceled)
 26. A method of publishing a plurality of books for user searching and access over a computer network, the method comprising: selecting a plurality of books, each book being in a layout form for hardcopy publication and not an electronic publishing format; automatically detecting features of the books, wherein detecting features includes extracting text from the book and determining hierarchical features of the books; tagging at least some of the text extracted from the books with determined hierarchical features relating to that text; storing on a computer network the tagged text; and providing a search engine to search over the computer network the tagged text, the search engine scoring search results at least in part based upon the hierarchical features with which the text is tagged.
 27. The method of claim 26, wherein the hierarchical features of the books includes chapters.
 28. The method of claim 27, wherein the hierarchical features of the books further includes sections and sub-sections.
 29. The method of claim 28, wherein the hierarchical features of the books further includes titles of the chapters, sections and sub-sections.
 30. The method of claim 27, wherein the hierarchical features of the books includes a plurality of lowest level elements, and the search engine searches the lowest level elements for one or more search terms entered by a user to calculate a relevancy ranking for each of the lowest level elements determined to be relevant by the search engine.
 31. The method of claim 30, wherein the relevancy rankings for lowest level elements are combined into a relevancy ranking for a next highest level element.
 32. The method of claim 26, further comprising presenting search results from the search engine to a user over the network.
 33. The method of claim 32, further comprising displaying to a user a portion of a book related to a search result that is selected by the user.
 34. The method of claim 26, wherein a computer tool is applied to automatically detect the features of the books.
 35. The method of claim 34, wherein the tool further detects characteristic features of a book including format of section headings, layout of text, drawings and captions and organizational features of the book.
 36. The method of claim 33, wherein the books are associated with a plurality of different publishers and further comprising monitoring access by users to books and determining royalties owed to the publishers associated with books that have been displayed to users.
 37. A publishing system operative on a plurality of books, each book being in a layout form for hardcopy publication and not an electronic publishing format, wherein the system comprises: a document conversion processor for converting the books in a layout form for hardcopy publication into an electronic publishing format; a detecting processor for automatically detecting hierarchical features of the books; an extracting processor for extracting text or data information of the books and for tagging the extracted text or data information of the books with the hierarchical features detected by the detecting processor; a memory for storing the tagged text or data information from the means for extracting in a publishing database, the memory comprising a searchable database of said plurality of books configured for a user to remotely access the database over a network or internet; and a search engine effective to search the memory for specified search data, and return an image of a portion of the book with the desired data; wherein the search engine ranks search results based on their position in the hierarchy to determine their degree of relevance of text or data information located by the search engine.
 38. The publishing system of claim 37, wherein the memory includes a publishing database having text or data information organized and related as chapter, section, subsection or other hierarchy, and the search engine scores search results based upon at least one factor selected from the set of factors including position of located search data in the hierarchy.
 39. The publishing system of claim 37, wherein the memory includes a publishing database having hyperlinks to related section heading data, such that a user accessing a page of a book may immediately view related data or context of a page.
 40. The publishing system of claim 37, wherein the memory includes a publishing database indexed by logical subunits of the books such that expanded searches for boolean combinations or proximity of elements span page breaks of book text to identify all instances of the desired search data.
 41. The publishing system of claim 37, wherein the hierarchical features of the books includes chapters.
 42. The publishing system of claim 41, wherein the hierarchical features of the books further includes sections and sub-sections.
 43. The publishing system of claim 42, wherein the hierarchical features of the books further includes titles of the chapters, sections and sub-sections.
 44. The publishing system of claim 41, wherein the hierarchical features of the books includes a plurality of lowest level elements, and the search engine searches the lowest level elements for one or more search terms entered by a user to calculate a relevancy ranking for each of the lowest level elements determined to be relevant by the search engine.
 45. The publishing system of claim 44, wherein the relevancy rankings for lowest level elements are combined into a relevancy ranking for a next highest level element.
 46. The publishing system of claim 37, further comprising a display for presenting search results from the search engine to a user over the network.
 47. The publishing system of claim 46, wherein the display displays to a user a portion of a book related to a search result that is selected by the user.
 48. The publishing system of claim 37, wherein the detecting processor further detects characteristic features of a book including format of section headings, layout of text, drawings and captions and organizational features of the book.
 49. The publishing system of claim 37, further comprising a monitor, wherein the books are associated with a plurality of different publishers and the monitor monitors access by users to books and determines royalties owed to the publishers associated with books that have been displayed to users.